Learning Policies for Contextual Submodular Prediction

Authors

Stéphane Ross

Jiaji Zhou

Yisong Yue

Debadeepta Dey

J. Andrew Bagnell

Abstract

Many prediction domains, such as ad placement, recommendation, trajectory prediction, and document summarization, require predicting a set or list of options. Such lists are often evaluated using submodular reward functions that measure both quality and diversity. We propose a simple, efficient, and provably near-optimal approach to optimizing such prediction problems based on no-regret learning. Our method leverages a surprising result from online submodular optimization: a single no-regret online learner can compete with an optimal sequence of predictions. Compared to previous work, which either learns a sequence of classifiers or relies on stronger assumptions such as realizability, we ensure both data efficiency and performance guarantees in the fully agnostic setting. Experiments validate the efficiency and applicability of the approach on a wide range of problems, including manipulator trajectory optimization, news recommendation, and document summarization.
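
As a minimal illustration of a submodular list reward that measures both quality and diversity, the sketch below assumes a toy topic-coverage reward (a standard example of a monotone submodular function). The item names, topics, and the helper functions `coverage` and `greedy_list` are hypothetical. Greedy construction by marginal gain is shown only as a common reference point for submodular list prediction; it is not the authors' no-regret learning method, which also uses features of the context.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy data: each candidate item (e.g. a news article) covers a set of topics.
ITEMS: Dict[str, Set[str]] = {
    "a": {"politics", "economy"},
    "b": {"politics"},
    "c": {"sports", "health"},
    "d": {"economy", "tech", "science"},
}


def coverage(items: Dict[str, Set[str]], lst: List[str]) -> int:
    """Monotone submodular reward on a list: number of distinct topics covered."""
    covered: Set[str] = set()
    for name in lst:
        covered |= items[name]
    return len(covered)


def greedy_list(items: Dict[str, Set[str]], k: int) -> List[str]:
    """Build a list of up to k items, each time appending the item with the largest marginal gain."""
    chosen: List[str] = []
    for _ in range(k):
        base = coverage(items, chosen)
        best: Optional[str] = None
        best_gain = -1
        for name in sorted(items):  # sorted order gives deterministic tie-breaking
            if name in chosen:
                continue
            gain = coverage(items, chosen + [name]) - base
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no unused items left
            break
        chosen.append(best)
    return chosen


print(greedy_list(ITEMS, 3))         # ['d', 'c', 'a']
print(coverage(ITEMS, ["a", "b"]))   # 2: "b" adds nothing once "a" is listed
print(coverage(ITEMS, ["a", "c"]))   # 4: diverse items add new topics
```

Submodularity appears here as diminishing returns: item "b" is worth one topic on its own but nothing once "a" is already in the list, so a good list favors diverse items.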

Similar resources

Learning Policies for Contextual Submodular Prediction - Supplementary Material

Lemma 2. Let S be a set, and f a monotone submodular function defined on lists of items in S. Let A, B be any lists of items from S. Denote by A_j the list of the first j items in A, by U(B) the uniform distribution on items in B, and define $\epsilon_j = \mathbb{E}_{s \sim U(B)}[f(A_{j-1} \oplus s)] - f(A_j)$, the additive error term in competing with the average marginal benefits of the items in B when picking the j-th item in A (which coul...

Online Submodular Set Cover, Ranking, and Repeated Active Learning

We propose an online prediction version of submodular set cover with connections to ranking and repeated active learning. In each round, the learning algorithm chooses a sequence of items. The algorithm then receives a monotone submodular function and suffers loss equal to the cover time of the function: the number of items needed, when items are selected in order of the chosen sequence, to ach...

Predicting Contextual Sequences via Submodular Function Maximization

Sequence optimization, where the items in a list are ordered to maximize some reward, has many applications such as web advertisement placement, search, and control libraries in robotics. Previous work in sequence optimization produces a static ordering that does not take any features of the item or context of the problem into account. In this work, we propose a general approach to order the ite...

Knapsack Constrained Contextual Submodular List Prediction with Application to Multi-document Summarization

We study the problem of predicting a set or list of options under a knapsack constraint. The quality of such lists is evaluated by a submodular reward function that measures both quality and diversity. Similar to DAgger (Ross et al., 2010), by a reduction to online learning, we show how to adapt two sequence prediction models to imitate greedy maximization under knapsack constraint problems: CON...

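
The error term in Lemma 2 of the supplementary material listed above is $\epsilon_j = \mathbb{E}_{s \sim U(B)}[f(A_{j-1} \oplus s)] - f(A_j)$. The sketch below computes it exactly for a hypothetical toy coverage reward, with the lists A and B and the helper `epsilon` chosen only for illustration. It evaluates just this defined quantity, since the statement of the lemma's bound is truncated here.

```python
from fractions import Fraction
from typing import Dict, List, Set

# Hypothetical toy items: each item covers a set of topics.
ITEMS: Dict[str, Set[str]] = {
    "a": {"politics", "economy"},
    "b": {"politics"},
    "c": {"sports", "health"},
    "d": {"economy", "tech", "science"},
}


def coverage(lst: List[str]) -> int:
    """Monotone submodular reward f on a list: number of distinct topics covered."""
    covered: Set[str] = set()
    for name in lst:
        covered |= ITEMS[name]
    return len(covered)


def epsilon(a: List[str], b: List[str], j: int) -> Fraction:
    """eps_j = E_{s ~ U(B)}[f(A_{j-1} + [s])] - f(A_j), with j 1-indexed."""
    prefix = a[: j - 1]  # A_{j-1}: the first j-1 items of A
    # Exact average of f(A_{j-1} appended with s) over s drawn uniformly from B.
    avg = Fraction(sum(coverage(prefix + [s]) for s in b), len(b))
    return avg - coverage(a[:j])


A = ["d", "c", "a"]  # hypothetical predicted list
B = ["a", "b", "c"]  # hypothetical comparison list
print([str(epsilon(A, B, j)) for j in (1, 2, 3)])  # ['-4/3', '-2/3', '-1/3']
```

Here every $\epsilon_j$ is negative, meaning each pick in A gained more than an item drawn uniformly from B would have gained at the same step.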